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Abstract. There are numerous stochastic models for cancer risk for a given tissue. 
Many rely on the following two hypotheses. 

1. There is a hxed probability that a given cell division will eventually lead to a 
cancerous cell. 

2. Cell divisions are nefarious or not independently of each other. 

We show that recent data on cancer risk and number of stem divisions is consistent 
with hypotheses 1 and 2. 

Theoretical models 

Cancer has long been thought as being provoked by successive somatic mutations, see 
Knudson (2001) for a history of this hypothesis. A leading hypothesis in the appearance 
of cancer is that it starts with a single cell division gone wrong. Usually one assumes that 
as a result of a mutation during division a precancerous cell appears and then a cascade 
of subsequent mutations triggers the appearance of a cancerous cell. There are numerous 
probability models around this idea, see for instance Armitage and Doll (1954), Frank et 
al. (2003), Knudson (2001) and Schinazi (2006). It is believed that the number of stem 
cell divisions is particularly relevant to the appearance of cancer, see Cairns (2002). 

Whatever the details of the different probability models there seems to be two hxed 
hypotheses. They are: 

1. For a given tissue, at each stem cell division there is a hxed probability p that a 
new cell starts a process which eventually will yield a cancerous cell. 

2. Cell divisions are nefarious or not independently of each other. 

These hypotheses can be thought as randomness hypotheses for the appearance of 
cancer. In this note we are interested in testing hypotheses 1 and 2 and in estimating p. 

To be more precise consider a human tissue that undergoes D stem cells divisions over 
its lifetime. Under hypotheses 1 and 2 we have that the risk of cancer r for this tissue is 

r = P{ cancer ) = 1 — (1 — p)^. 

Depending on the details of the model p may be expressed exactly using mutation pro¬ 
babilities, see Schinazi (2014). 

Under hypotheses 1 and 2, pD is the expected number of times during the lifetime of 
this tissue that a cell division will eventually lead to a cancerous cell. That is, pD is the 


expected count of distinct cancers that affect a given tissue during its lifetime. It seems 
reasonable to assume that pD is much smaller than 1. Hence, we have the following 
approximation 

r ~ 1 — (1 — pD) = pD. 

That is, under hypotheses 1 and 2 one expects a linear relationship between r and D. 
We next test this hypothesis. 

The test 

We use the data provided by Tomasetti and Vogelstein (2015 a), see table SI in their 
supplemental material. Cancer risk r and total number of cell divisions D are given for 
31 cancers. Since we are interested in testing the randomness of the appearance of cancer 
we excluded 6 data points that predispose a tissue to cancer. Namely, we excluded 3 
inherited conditions (Colorectal adenocarcinoma with FAP, Colorectal adenocarcinoma 
with Lynch syndrome and Duodenum adenocarcinoma with FAP), two tissues infected 
with a cancer provoking virus (Head and neck squamous cell carcinoma with HPV-16, 
Hepatocellular carcinoma with HCV) and lung cancer for smokers. See also the discussion 
in Tomasetti and Vogelstein (2015 b). This leaves us with 25 data points. Here are the 
results: 

Pearson’s linear correlation between r and D is 0.67. 

Linear regression equation is 

r = 1.01 X 10“^^D + 0.01. 

We reject the null hypothesis p = 0 with a P-value < 0.001. 

Hence, an estimate of the parameter p is p = 1.01 x 10“^^. One expects p to be qnite 
small since it is the probability of eventual appearance of a cancer cell per division. 

Stability of the results 

Given that the nnmber of cell divisions D is of order 10^° and the cancer risk r is of 
order 10“^ one concern is the stability of our estimates. 

To check the stability of our results we excluded 5 data points at random from the 
25 data points and ran the computations. We ran the compntations 100 times on the 
remaining randomly selected 20 points. We got an average correlation of 0.69 and an 
average p of 1.1 x 10“^^. These averages are remarkably close to the estimates on the 
whole sample. Hence, these resnlts give some conhdence in the estimates we obtained. 

Conclusion 

The data is consistent with the idea that cancer has a large stochastic component. This 
idea is central to many of the theoretical models for cancer risk. The correlation between 
cancer risk and total number of stem cell divisions is snrprisingly high. The estimate for 
p seems consistent with the hypothesis that p is a product of small probabilities. 
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